@Kobzol
Member

@Kobzol Kobzol commented Oct 23, 2025

Alternative to #144.

This reduces the time needed to glob **/*.rs in a rustc checkout on my PC from ~1000 ms to ~820 ms:

Benchmark 1: ./local
  Time (mean ± σ):     822.0 ms ±  11.2 ms    [User: 272.5 ms, System: 549.2 ms]
  Range (min … max):   811.8 ms … 849.6 ms    10 runs
 
Benchmark 2: ./upstream
  Time (mean ± σ):      1.019 s ±  0.006 s    [User: 0.468 s, System: 0.551 s]
  Range (min … max):    1.005 s …  1.026 s    10 runs
 
Summary
  ./local ran
    1.24 ± 0.02 times faster than ./upstream
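
For readers checking the summary line, here is a minimal sketch of how a figure like "1.24 ± 0.02 times faster" can be derived from the two reported means and standard deviations. It assumes standard first-order error propagation for a ratio of independent measurements; the `Timing` struct and `speedup` function are hypothetical names used for illustration, not part of hyperfine or glob.

```rust
/// Mean and standard deviation of one benchmark, in milliseconds.
/// (Hypothetical helper type, introduced only for this illustration.)
#[derive(Clone, Copy)]
struct Timing {
    mean_ms: f64,
    stddev_ms: f64,
}

/// Returns (ratio, uncertainty) for "faster ran `ratio` times faster than slower".
/// The uncertainty assumes independent runs and first-order error propagation:
/// the relative errors of both measurements are added in quadrature.
fn speedup(slower: Timing, faster: Timing) -> (f64, f64) {
    let ratio = slower.mean_ms / faster.mean_ms;
    let rel_slower = slower.stddev_ms / slower.mean_ms;
    let rel_faster = faster.stddev_ms / faster.mean_ms;
    let uncertainty = ratio * (rel_slower.powi(2) + rel_faster.powi(2)).sqrt();
    (ratio, uncertainty)
}
```

Feeding in the numbers above (822.0 ± 11.2 ms for ./local, 1019 ± 6 ms for ./upstream) gives a ratio that rounds to 1.24 with an uncertainty that rounds to 0.02, matching the summary.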

@Kobzol Kobzol requested a review from tgross35 October 23, 2025 07:14
Comment on lines +951 to +952
// filename from `DirEntry` here, we store it proactively (we still
// have to heap allocate it).
Member

"still"? afaict this adds an allocation. So it's surprising that allocating an extra OsString is still cheaper than getting the filename from the path later.

If you keep the whole DirEntry you could call file_name_ref on cfg(unix) to avoid that allocation too.

Member Author

It is a bit surprising, but the benchmark results are quite clear. It's one allocation versus potentially tens, hundreds, or thousands of iterations over path components, which apparently isn't cheap, performance-wise.

file_name_ref is unstable, as far as I can see?
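
A minimal sketch of the trade-off being discussed, assuming a hypothetical `CachedEntry` wrapper (the names here are illustrative and not taken from the glob source): the file name is copied once from the `DirEntry` when the directory is read, instead of being re-derived from the full path with `Path::file_name` each time it is needed.

```rust
use std::ffi::{OsStr, OsString};
use std::fs::DirEntry;
use std::path::{Path, PathBuf};

/// Hypothetical wrapper: a path plus its proactively stored file name.
struct CachedEntry {
    path: PathBuf,
    file_name: OsString,
}

impl CachedEntry {
    /// One extra heap allocation per entry: `DirEntry::file_name` returns an
    /// owned `OsString`.
    fn from_dir_entry(entry: &DirEntry) -> Self {
        CachedEntry {
            path: entry.path(),
            file_name: entry.file_name(),
        }
    }

    /// Same shape, built without touching the file system (handy for tests).
    fn from_parts(dir: &Path, name: &OsStr) -> Self {
        CachedEntry {
            path: dir.join(name),
            file_name: name.to_os_string(),
        }
    }

    /// Cheap: returns the stored name without parsing the path.
    fn cached_name(&self) -> &OsStr {
        &self.file_name
    }

    /// The alternative: re-derive the name by parsing the path's components
    /// on every call.
    fn recomputed_name(&self) -> Option<&OsStr> {
        self.path.file_name()
    }
}
```

Both accessors return the same name; the difference is only in where the work happens. If the name is looked up many times per entry during matching, paying for one allocation up front can plausibly win, which is what the benchmark suggests.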

Member

> file_name_ref is unstable, as far as I can see?

Right, too bad.

Regarding the overhead, maybe wait until my rust PR lands and then benchmark again? The file_name optimization looks like it just might give us that 20% speedup too.

Member Author

Fair enough, once it gets into nightly, I'll benchmark it against a previous nightly to see if there's a big difference.

@Kobzol
Member Author

Kobzol commented Nov 11, 2025

With rust-lang/rust#148084 landed:

  • Upstream (0.3.3)
/pr/pe/ru/gl/globtest [cache-filename]$ hyperfine ./globtest-2025-11-09 ./globtest-2025-11-10
Benchmark 1: ./globtest-2025-11-09
  Time (mean ± σ):     761.6 ms ±   3.6 ms    [User: 359.8 ms, System: 401.6 ms]
  Range (min … max):   756.5 ms … 768.4 ms    10 runs
 
Benchmark 2: ./globtest-2025-11-10
  Time (mean ± σ):     738.1 ms ±   4.2 ms    [User: 342.1 ms, System: 395.8 ms]
  Range (min … max):   734.4 ms … 748.6 ms    10 runs
 
Summary
  ./globtest-2025-11-10 ran
    1.03 ± 0.01 times faster than ./globtest-2025-11-09
  • This PR
/pr/pe/ru/gl/globtest [cache-filename]$ hyperfine ./globtest-2025-11-09 ./globtest-2025-11-10
Benchmark 1: ./globtest-2025-11-09
  Time (mean ± σ):     614.1 ms ±   4.4 ms    [User: 212.6 ms, System: 401.2 ms]
  Range (min … max):   608.1 ms … 621.7 ms    10 runs
 
Benchmark 2: ./globtest-2025-11-10
  Time (mean ± σ):     606.5 ms ±   1.6 ms    [User: 203.3 ms, System: 403.0 ms]
  Range (min … max):   603.8 ms … 609.8 ms    10 runs
 
Summary
  ./globtest-2025-11-10 ran
    1.01 ± 0.01 times faster than ./globtest-2025-11-09

Still seems worth landing (~600ms vs ~730ms).

@the8472
Member

the8472 commented Nov 11, 2025

Are those results comparable to the previous ones, i.e. did we get a speedup from the path changes and this PR will provide additional ones on top?

@Kobzol
Member Author

Kobzol commented Nov 11, 2025

This should be the difference with/without your PR with 0.3.3 of glob:

Benchmark 1: ./globtest-2025-11-09
  Time (mean ± σ):     761.6 ms ±   3.6 ms    [User: 359.8 ms, System: 401.6 ms]
  Range (min … max):   756.5 ms … 768.4 ms    10 runs
 
Benchmark 2: ./globtest-2025-11-10
  Time (mean ± σ):     738.1 ms ±   4.2 ms    [User: 342.1 ms, System: 395.8 ms]
  Range (min … max):   734.4 ms … 748.6 ms    10 runs

It looks like your changes helped, but this PR provides an additional improvement of around 10-15%.

@the8472
Member

the8472 commented Nov 11, 2025

Thanks, that helps when arguing about the compile-time impact. The gain is also not as big as it looked in the micro-benchmarks, but so it goes.

@Kobzol Kobzol enabled auto-merge (squash) November 13, 2025 17:02
@Kobzol Kobzol merged commit 0639988 into rust-lang:master Nov 13, 2025
17 checks passed
@Kobzol Kobzol deleted the cache-filename branch November 13, 2025 17:14